Maximum-score diversity selection

Author

  • Thorsten Meinl
Abstract

This thesis discusses the problem of Maximum-Score Diversity Selection (MSDS). Pure diversity selection, as often performed, e.g., in early drug discovery, is the selection of a subset of available objects that is as diverse as possible. MSDS adds a second objective: it also tries to maximize the “score” of the subset, which is usually the sum of the scores of all elements in the subset. MSDS is therefore a classical multi-objective optimization problem, since the two objectives, maximizing score and maximizing diversity, tend to conflict with each other. In this thesis, several methods for efficiently solving this special multi-objective optimization problem are presented, developed, and evaluated. After a more detailed discussion of the application of MSDS in drug discovery, the question of suitable definitions of diversity is considered. This is essential for the later application domains, where users have only a vague notion of diversity. The Maximum-Score Diversity Selection problem is then formalized and shown to be NP-hard. Therefore, unless P = NP, exact solutions cannot be computed efficiently for all but the smallest instances. After MSDS is placed in the context of multi-objective optimization, the use of evolutionary algorithms, specifically genetic algorithms, for solving the problem is evaluated. This includes the presentation of novel genetic operators for evolving subsets or combinations of objects. However, as universal tools, genetic algorithms may not be the best technique for this particular problem. Hence, several problem-specific heuristics are discussed: two of them are motivated by the transformation of MSDS into a graph-theoretic problem used in the NP-hardness proof, and one is a novel heuristic method called Score Erosion.
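To make the two conflicting objectives concrete, the following Python sketch enumerates every k-subset of a tiny dataset and keeps the Pareto-optimal (non-dominated) ones. The toy data in `SCORES` and `POINTS` are invented, and using the minimum pairwise Euclidean distance as the diversity measure is an assumption for illustration only; the thesis considers several definitions of diversity, and this is not its method. The score objective follows the abstract (sum of element scores). Because the number of candidate subsets, C(n, k), grows combinatorially, this kind of exhaustive enumeration is only feasible for very small instances.

```python
import math
from itertools import combinations

# Hypothetical toy data: each object has a score and a 2-D descriptor vector.
SCORES = [9, 8, 7, 3, 2]
POINTS = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (10.0, 0.0)]


def total_score(subset):
    """Score objective: the sum of the scores of all selected objects."""
    return sum(SCORES[i] for i in subset)


def diversity(subset):
    """Diversity objective (an assumption): minimum pairwise Euclidean distance."""
    return min(math.dist(POINTS[i], POINTS[j]) for i, j in combinations(subset, 2))


def dominates(a, b):
    """True if objective pair a is at least as good as b in both and better in one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(k):
    """Enumerate all C(n, k) subsets (k >= 2) and keep the non-dominated ones."""
    candidates = [
        ((total_score(s), diversity(s)), s)
        for s in combinations(range(len(SCORES)), k)
    ]
    return sorted(
        (obj, s)
        for obj, s in candidates
        if not any(dominates(other, obj) for other, _ in candidates)
    )


for (score, div), subset in pareto_front(k=2):
    print(f"subset={subset} score={score} diversity={div:.3f}")
```

On this toy data, the highest-scoring pair consists of two near-duplicate objects, while the most diverse pairs have much lower total scores; the front lists the trade-offs in between rather than a single optimum.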
The comparison of all approaches on various synthetic and real-world datasets reveals that, given the right measures of diversity, all heuristics find solutions of similar quality, with Score Erosion being the fastest of all presented algorithms owing to its linear time complexity. The thesis also investigates how the structure of the search space influences the results and whether applying MSDS pays off in practice.
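The following Python sketch illustrates the general idea of a greedy, erosion-style heuristic: repeatedly pick the item with the highest current score, then reduce the scores of the remaining items according to their similarity to that pick. It is a hypothetical simplification, not the Score Erosion algorithm from the thesis. The `similarity` function (an exponential decay of Euclidean distance), its `scale` parameter, the multiplicative update rule, and the toy data are all assumptions made for illustration.

```python
import math

# Hypothetical toy data: scores plus 2-D descriptor vectors.
SCORES = [9, 8, 7, 3, 2]
POINTS = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (10.0, 0.0)]


def similarity(a, b, scale=1.0):
    """Hypothetical similarity in (0, 1]: exponential decay of Euclidean distance."""
    return math.exp(-math.dist(a, b) / scale)


def erosion_greedy(scores, points, k, scale=1.0):
    """Greedily select k items, eroding the scores of items similar to each pick.

    Illustrative sketch only; not the thesis's Score Erosion update rule.
    """
    current = [float(s) for s in scores]  # working copy of the eroded scores
    remaining = set(range(len(scores)))
    selected = []
    for _ in range(min(k, len(scores))):
        # Highest current score wins; ties go to the lowest index.
        best = max(remaining, key=lambda i: (current[i], -i))
        selected.append(best)
        remaining.discard(best)
        # Shrink each remaining score by how similar it is to the new pick.
        for i in remaining:
            current[i] *= 1.0 - similarity(points[i], points[best], scale)
    return selected


print(erosion_greedy(SCORES, POINTS, k=3))  # → [0, 3, 4]
```

Plain greedy selection by score would return the three near-duplicate objects 0, 1, and 2; eroding their scores after object 0 is picked pushes the selection toward the distant objects 3 and 4. Each step updates every remaining item once, so selecting k items costs O(k·n) similarity evaluations in this sketch.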

Similar resources

Maximum-score diversity selection for early drug discovery

Diversity selection is a common task in early drug discovery. One drawback of current approaches is that usually only structural diversity is taken into account; activity information is therefore ignored. In this article, we present a modified version of diversity selection, which we term Maximum-Score Diversity Selection, that additionally takes the estimated or predicted activities of th...

Determination of the diversity of consumed foods and its relationship with the adequacy of nutrient intake in an urban area of Tehran

Background: The present study was conducted to determine the dietary diversity score and its association with other measures of diet quality, including the mean adequacy ratio. Materials and methods: After excluding under-reporting subjects, 581 individuals over 18 years old (295 women and 286 men) were selected. Their dietary intake assessments were based on two-day 24-hour food recall interviews. A di...

Feature Selection by Maximum Marginal Diversity

We address the question of feature selection in the context of visual recognition. It is shown that, besides being efficient from a computational standpoint, the infomax principle is nearly optimal in the minimum Bayes error sense. The concept of marginal diversity is introduced, leading to a generic principle for feature selection (the principle of maximum marginal diversity) of extreme computationa...

Evaluation of Morphological and Pomological Diversity of 62 Almond Cultivars and Superior Genotypes in Iran

Identification and selection of promising fruit-tree genotypes are primary steps in breeding programs. The economic importance of almond production worldwide has stimulated numerous studies related to breeding, quantitative and qualitative traits, increasing yield, and decreasing production costs. In this study, morphological and pomological characteristics of 60 cultivars and superior g...

Dual-Mode Antenna Selection for Spatial Multiplexing Systems with Linear Receivers

Wireless links with multi-antenna transmitters and receivers can be used to provide increased diversity and/or large data-rates. It has been shown that multi-antenna communication systems can simultaneously achieve the maximum diversity and maximum rate (or multiplexing) gain. This paper proposes a modified version of traditional spatial multiplexing that allows the wireless system to obtain ma...

Publication date: 2010